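Machine learning (ML) algorithms have shown a growing trend of helping scientific communities across different disciplines and institutions address large and diverse data problems. However, many available ML tools are programmatically demanding and computationally costly. The MLExchange project aims to build a collaborative platform equipped with enabling tools that allow scientists and facility users without a deep ML background to use ML and computational resources for scientific discovery. At a high level, we target a complete user experience in which ML algorithms, workflows, and data can be managed and exchanged through readily available web applications. So far, we have built four major components, namely a central job manager, a centralized content registry, a user portal, and a search engine, and have successfully deployed these components on a test server. Because each component is an independent container, the whole platform or its individual services can easily be deployed on servers of different scales, from a laptop (typically a single user) to a high-performance cluster (HPC) accessed simultaneously by many users. MLExchange therefore supports flexible usage scenarios: users can either access services and resources on a remote server or run the whole platform or its individual services within their local network.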
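Researchers make heavy use of in situ synchrotron high-energy X-ray powder diffraction (XRD) to analyze the crystal structures of materials in functional devices (e.g., battery materials) or in complex sample environments (e.g., reactors). The atomic structure of a material can be identified from its diffraction pattern, together with detailed analyses such as Rietveld refinement, which indicates how the measured structure deviates from the ideal structure (e.g., internal stresses or defects). For in situ experiments, a series of XRD images is usually collected on the same sample under different conditions (e.g., adiabatic conditions), yielding different states of matter, or simply collected continuously as a function of time to track changes in the sample over a chemical or physical process. In situ experiments are usually performed with area detectors, which collect 2D images composed of the diffraction rings of an ideal powder. Depending on the form of the material, one may observe features other than the typical Debye-Scherrer rings of a realistic sample and its environment, such as texture or preferred orientation, and single-crystal diffraction spots in the 2D XRD image. In this work, we present a study of machine learning methods for fast and reliable identification of single-crystal diffraction spots in XRD images. Excluding these artifacts during XRD image integration allows a precise analysis of the powder diffraction rings of interest. We observe that the gradient boosting method consistently produces highly accurate results when trained on small subsets of highly diverse datasets. Compared with the conventional method, this approach greatly reduces the time spent identifying and separating single-crystal spots.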
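For context, one simple idea behind conventional spot masking is that an ideal powder ring is roughly uniform in azimuth, so a single-crystal spot shows up as a bright outlier relative to the other pixels at the same radius. The NumPy sketch below implements that kind of azimuthal-median outlier mask. It is a hedged illustration of a simple baseline, not the gradient boosting classifier described above. The function name `mask_single_crystal_spots`, the radial binning, and the thresholds `k` and `min_pixels` are all assumptions, and a real pipeline would bin in 2θ after detector calibration rather than in raw pixel radius.

```python
import numpy as np


def mask_single_crystal_spots(image, center, n_bins=200, k=5.0, min_pixels=10):
    """Flag pixels far brighter than the azimuthal median of their ring.

    Hypothetical baseline (not the paper's gradient boosting model):
    pixels are grouped by distance from the beam center, and within each
    radial bin any pixel above median + k * robust_sigma is flagged.
    """
    yy, xx = np.indices(image.shape)
    r = np.hypot(yy - center[0], xx - center[1])
    # Radial bins stand in for the 2-theta bins of a calibrated detector.
    edges = np.linspace(0.0, r.max() + 1e-9, n_bins + 1)
    bin_idx = np.digitize(r, edges) - 1
    mask = np.zeros(image.shape, dtype=bool)
    for b in range(n_bins):
        sel = bin_idx == b
        if sel.sum() < min_pixels:
            continue  # too few pixels for a robust median estimate
        vals = image[sel]
        med = np.median(vals)
        # 1.4826 * MAD estimates the standard deviation for Gaussian noise.
        robust_sigma = 1.4826 * np.median(np.abs(vals - med)) + 1e-12
        mask[sel] = vals > med + k * robust_sigma
    return mask
```

On a synthetic image with one ring plus a single very bright pixel on that ring, the mask flags the bright pixel while leaving the smooth ring intact; masked pixels would then be excluded before azimuthal integration. A learned classifier such as gradient boosting replaces the fixed threshold with a decision learned from labeled examples.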
Due to the environmental impact of the construction industry, repurposing existing buildings and making them more energy-efficient has become a high priority. However, land developers have a legitimate concern about the buildings' state of conservation. For this reason, infrared thermography has been used as a powerful tool to characterize the state of conservation of these buildings by detecting pathologies such as cracks and humidity. Thermal cameras detect the radiation emitted by materials and translate it into temperature-color-coded images. Abnormal temperature changes may indicate the presence of pathologies; however, reading thermal images is not always simple. This research project aims to combine infrared thermography and machine learning (ML) to help stakeholders determine the viability of reusing existing buildings by identifying their pathologies and defects more efficiently and accurately. In this phase of the project, we used a deep convolutional neural network (DCNN) image classification model to differentiate three levels of cracks in one particular building. To test the influence of the input data and network on the detection results, we compared the model's accuracy on MSX and thermal images acquired from two distinct thermal cameras, and on fused images formed from multisource information.
Rapid advancements in the collection and dissemination of multi-platform molecular and genomic data have created enormous opportunities to aggregate such data in order to understand, prevent, and treat human diseases. While significant improvements have been made in multi-omic data integration methods to discover biological markers and mechanisms underlying both prognosis and treatment, the precise cellular functions governing these complex mechanisms still need detailed, data-driven de novo evaluation. We propose a framework called Functional Integrative Bayesian Analysis of High-dimensional Multiplatform Genomic Data (fiBAG) that allows simultaneous identification of upstream functional evidence for proteogenomic biomarkers and incorporation of this knowledge into Bayesian variable selection models to improve signal detection. fiBAG employs a conflation of Gaussian process models to quantify (possibly non-linear) functional evidence via Bayes factors, which are then mapped to a novel calibrated spike-and-slab prior, thus guiding selection and providing functional relevance to the associations with patient outcomes. Using simulations, we illustrate how integrative methods with functional calibration have higher power to detect disease-related markers than non-integrative approaches. We demonstrate the utility of fiBAG via a pan-cancer analysis of 14 cancer types to identify and assess the cellular mechanisms of proteogenomic markers associated with cancer stemness and patient survival.
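The calibration step rests on a Bayes factor that compares a Gaussian process (GP) model, in which a marker depends (possibly non-linearly) on an upstream covariate, against a null model without that dependence. The NumPy sketch below computes such a log Bayes factor with fixed RBF-kernel hyperparameters and turns it into a prior inclusion probability for a spike-and-slab model by combining it with base-rate prior odds. This is a simplified illustration under stated assumptions: fiBAG uses a conflation of GP models and its own calibration, whereas the fixed hyperparameters, the equal-variance null model, and the sigmoid mapping with `base_rate` here are hypothetical choices.

```python
import numpy as np


def gp_log_marginal(x, y, lengthscale=1.0, signal_var=1.0, noise_var=0.1):
    """Log marginal likelihood of y under a zero-mean GP with an RBF kernel."""
    n = len(y)
    sq_dist = (x[:, None] - x[None, :]) ** 2
    K = signal_var * np.exp(-0.5 * sq_dist / lengthscale**2) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    # -0.5 y^T K^{-1} y - 0.5 log|K| - 0.5 n log(2 pi), with log|K| = 2 sum log diag(L)
    return float(-0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi))


def null_log_marginal(y, total_var):
    """Null model: y is i.i.d. Gaussian noise, unrelated to the covariate."""
    n = len(y)
    return float(-0.5 * (y @ y) / total_var - 0.5 * n * np.log(2 * np.pi * total_var))


def calibrated_inclusion_prob(x, y, base_rate=0.1, signal_var=1.0, noise_var=0.1):
    """Map a GP-vs-null log Bayes factor to a prior inclusion probability.

    Hypothetical mapping: posterior odds = BF * prior odds, i.e.
    p = sigmoid(log BF + logit(base_rate)).
    """
    y = (y - y.mean()) / y.std()  # standardize the marker
    # The null gets the same marginal variance as the GP, but no x-dependence.
    log_bf = (gp_log_marginal(x, y, signal_var=signal_var, noise_var=noise_var)
              - null_log_marginal(y, signal_var + noise_var))
    z = log_bf + np.log(base_rate / (1.0 - base_rate))
    return log_bf, 1.0 / (1.0 + np.exp(-z))
```

With these settings, a marker that tracks a smooth function of its covariate receives a large positive log Bayes factor and an inclusion probability near 1, while pure noise receives a negative log Bayes factor and a probability near 0. The resulting probability would then serve as the prior weight on the slab component for that marker.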
Recent increases in the computational demands of deep neural networks (DNNs) have sparked interest in efficient deep learning mechanisms, e.g., quantization or pruning. These mechanisms enable the construction of small, efficient versions of commercial-scale models with comparable accuracy, accelerating their deployment to resource-constrained devices. In this paper, we study the security considerations of publishing on-device variants of large-scale models. We first show that an adversary can exploit on-device models to make attacking the large models easier. In evaluations across 19 DNNs, using the published on-device models as a transfer prior increases the adversarial vulnerability of the original commercial-scale models by up to 100x. We then show that the vulnerability increases as the similarity between a full-scale model and its efficient counterpart increases. Based on these insights, we propose a defense, similarity-unpairing, that fine-tunes on-device models with the objective of reducing this similarity. We evaluated our defense on all 19 DNNs and found that it reduces transferability by up to 90% and the number of queries required by a factor of 10-100x. Our results suggest that further research is needed on the security (or even privacy) threats posed by publishing such efficient siblings.
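One common proxy for how similar a full-scale model and its on-device sibling look to an attacker is the alignment of their loss gradients with respect to the input: when the gradients point the same way, adversarial examples crafted on one tend to transfer to the other. The NumPy sketch below, written for linear softmax classifiers, computes that cosine similarity and a fine-tuning objective that adds it as a penalty to the on-device model's task loss. It only illustrates the idea behind similarity-unpairing under these assumptions. The paper's actual similarity measure, models, and training procedure may differ, and `unpairing_objective` and `lam` are hypothetical names.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def input_grad(W, x, label):
    """Gradient of cross-entropy w.r.t. the input x for logits = W @ x."""
    p = softmax(W @ x)
    p[label] -= 1.0  # dL/dlogits = softmax - one_hot
    return W.T @ p


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def unpairing_objective(W_small, W_full, x, label, lam=1.0):
    """On-device task loss plus a penalty on gradient alignment with the full model.

    Minimizing this (hypothetical) objective keeps the small model accurate
    while pushing its input gradients away from the full model's.
    """
    p = softmax(W_small @ x)
    task_loss = -np.log(p[label] + 1e-12)
    sim = cosine(input_grad(W_small, x, label), input_grad(W_full, x, label))
    return task_loss + lam * sim, sim
```

An unmodified copy of the full model has gradient similarity 1, the worst case from the defender's perspective. Fine-tuning would minimize this objective over `W_small`, typically with an autodiff framework rather than by hand.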
The highest-grossing media franchise of all time, with over $90 billion in total revenue, is Pokemon. Its video games belong to the class of Japanese role-playing games (J-RPG). Developing a powerful AI agent for these games is very hard because they pose major challenges to Minimax, Monte Carlo Tree Search, and statistical machine learning, as they differ vastly from the games well explored in the AI literature. An AI agent for one of these games would mean significant progress for AI agents across the entire class. Further, the key principles of such work can hopefully inspire approaches in several domains that require excellent teamwork under conditions of extreme uncertainty, including managing a team of doctors, robots, or employees in an ever-changing environment, such as a pandemic-stricken region or a war zone. In this paper, we first explain the mechanics of the game and perform a game analysis. We then propose unique AI algorithms based on our understanding that the two biggest challenges in the game are keeping a balanced team and dealing with three sources of uncertainty. Later, we describe why evaluating the performance of such agents is challenging and present the results of our approach. Our AI agent performed significantly better than all previous attempts and peaked at 33rd place in the world in one of the most popular battle formats, while running on only four single-socket servers.
Unlike in tabular data, features in network data are interconnected within a domain-specific graph. Examples of this setting include gene expression overlaid on a protein-protein interaction (PPI) network and user opinions in a social network. Network data is typically high-dimensional (a large number of nodes) and often contains outlier snapshot instances and noise. In addition, annotating instances with global labels (e.g., disease or normal) is often non-trivial and time-consuming. How can we jointly select discriminative subnetworks and representative instances for network data without supervision? We address these challenges with UISS, an unsupervised framework for joint subnetwork and instance selection in network data based on a convex self-representation objective. Given an unlabeled network dataset, UISS identifies representative instances while ignoring outliers. It outperforms state-of-the-art baselines on both discriminative subnetwork selection and representative instance selection, achieving up to 10% accuracy improvement on all real-world datasets we use for evaluation. When employed for exploratory analysis of RNA-seq network samples from multiple studies, it produces interpretable and informative summaries.
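Representative-instance selection via self-representation can be illustrated with a generic convex objective. Each instance (a column of X) is expressed as a combination of the instances, minimizing 0.5 * ||X - X C||_F^2 + lam * sum_i ||C[i, :]||_2, where the row-sparsity penalty forces only a few instances to be used as representatives. The NumPy sketch below solves this with proximal gradient descent. It is a hedged illustration of that generic formulation, in the spirit of sparse-modeling representative selection, and not UISS itself, which additionally selects subnetworks over the graph and handles outliers. `select_representatives` and `lam_ratio` are hypothetical names.

```python
import numpy as np


def select_representatives(X, lam_ratio=0.5, n_iter=500, tol=1e-8):
    """Row-sparse self-representation via proximal gradient descent.

    Minimizes 0.5 * ||X - X C||_F^2 + lam * sum_i ||C[i, :]||_2, with one
    instance per column of X. Instances whose row of C is nonzero are used
    to reconstruct the others and are returned as representatives.
    """
    G = X.T @ X                                # Gram matrix between instances
    n = G.shape[0]
    step = 1.0 / np.linalg.eigvalsh(G).max()   # 1 / Lipschitz constant of the gradient
    # C = 0 is optimal once lam >= max row norm of G, so scale lam below that.
    lam = lam_ratio * np.linalg.norm(G, axis=1).max()
    C = np.zeros((n, n))
    for _ in range(n_iter):
        grad = G @ C - G                       # gradient of the smooth term
        V = C - step * grad
        norms = np.linalg.norm(V, axis=1)
        # Group soft-thresholding: the prox of the row-wise l2 penalty.
        scale = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
        C_new = scale[:, None] * V
        converged = np.abs(C_new - C).max() < tol
        C = C_new
        if converged:
            break
    return np.flatnonzero(np.linalg.norm(C, axis=1) > 0), C
```

For example, if all instances are scalar multiples of one vector, the sketch selects only the instance with the largest norm, because it can reconstruct the others with the smallest penalty. `lam_ratio` trades the number of representatives against reconstruction error.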
In this paper, we present TruFor, a forensic framework that can be applied to a large variety of image manipulation methods, from classic cheapfakes to more recent manipulations based on deep learning. We rely on the extraction of both high-level and low-level traces through a transformer-based fusion architecture that combines the RGB image with a learned noise-sensitive fingerprint. The latter learns to embed the artifacts related to the camera's internal and external processing by training only on real data in a self-supervised manner. Forgeries are detected as deviations from the expected regular pattern that characterizes each pristine image. Looking for anomalies allows the approach to robustly detect a variety of local manipulations, ensuring generalization. In addition to a pixel-level localization map and a whole-image integrity score, our approach outputs a reliability map that highlights areas where localization predictions may be error-prone. This is particularly important in forensic applications in order to reduce false alarms and allow for large-scale analysis. Extensive experiments on several datasets show that our method reliably detects and localizes both cheapfake and deepfake manipulations, outperforming state-of-the-art methods. Code will be publicly available at https://grip-unina.github.io/TruFor/
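The three outputs described above can be combined in different ways downstream. As a hedged illustration only, the NumPy sketch below keeps pixel-level alarms solely where the reliability map is high and derives a simple image-level score from the reliability-weighted localization map. The thresholds, the quantile-based score, and both function names are hypothetical, and this rule is not necessarily how TruFor computes its own integrity score.

```python
import numpy as np


def reliable_alarms(loc_map, rel_map, loc_thr=0.5, rel_thr=0.5):
    """Pixel-level alarms kept only where the reliability map is high."""
    return (loc_map > loc_thr) & (rel_map > rel_thr)


def integrity_score(loc_map, rel_map, q=0.99):
    """Hypothetical image-level score: a high quantile of reliability-weighted localization."""
    # Down-weighting unreliable pixels suppresses alarms in error-prone areas.
    return float(np.quantile(loc_map * rel_map, q))
```

A confident detection in a reliable region keeps a high score, while the same detection in a low-reliability region is suppressed, which is the false-alarm reduction the abstract motivates.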
Transformer-based language models have been shown to be highly effective for several NLP tasks. In this paper, we consider three transformer models, BERT, RoBERTa, and XLNet, in both small and large versions, and investigate how faithful their representations are with respect to the semantic content of texts. We formalize a notion of semantic faithfulness, in which the semantic content of a text should causally figure in a model's inferences in question answering. We then test this notion by observing a model's behavior when answering questions about a story after performing two novel semantic interventions: deletion intervention and negation intervention. While transformer models achieve high performance on standard question answering tasks, we show that they fail to be semantically faithful once we perform these interventions in a significant number of cases (~50% for deletion intervention, and a ~20% drop in accuracy for negation intervention). We then propose an intervention-based training regime that can mitigate the undesirable effects of deletion intervention by a significant margin (from ~50% to ~6%). We analyze the inner workings of the models to better understand the effectiveness of intervention-based training for deletion intervention. However, we show that this training does not attenuate other aspects of semantic unfaithfulness, such as the models' inability to deal with negation intervention or to capture the predicate-argument structure of texts. We also test InstructGPT, via prompting, for its ability to handle the two interventions and to capture predicate-argument structure. While InstructGPT models do achieve very high performance on the predicate-argument structure task, they fail to respond adequately to our deletion and negation interventions.
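A deletion intervention can be made concrete with a toy harness: remove the sentences that support the gold answer, then check whether the model still produces that answer. The plain-Python sketch below does this with naive period-based sentence splitting and exact string matching. It counts how often the answer survives, which should be rare for a semantically faithful model. The splitting rule, the `QAModel` callable type, and the function names are assumptions for illustration, not the paper's protocol, and the negation intervention is not covered.

```python
from typing import Callable, List, Tuple

# A QA model maps (story, question) to an answer string.
QAModel = Callable[[str, str], str]


def delete_support(story: str, answer: str) -> str:
    """Deletion intervention: drop every sentence that mentions the answer."""
    sentences = [s.strip() for s in story.split(".") if s.strip()]
    kept = [s for s in sentences if answer.lower() not in s.lower()]
    return ". ".join(kept) + ("." if kept else "")


def unfaithful_rate(model: QAModel, examples: List[Tuple[str, str, str]]) -> float:
    """Fraction of (story, question, answer) examples whose answer survives deletion."""
    survived = 0
    for story, question, answer in examples:
        new_answer = model(delete_support(story, answer), question)
        if new_answer.strip().lower() == answer.strip().lower():
            survived += 1
    return survived / len(examples)
```

A toy model that actually reads the story stops answering "kitchen" once "Mary went to the kitchen." is deleted, giving a rate of 0.0. A model that always answers "kitchen" regardless of the text gets 1.0, the unfaithful extreme.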
From smoothly pursuing moving objects to rapidly shifting gazes during visual search, humans employ a wide variety of eye movement strategies in different contexts. While eye movements provide a rich window into mental processes, building generative models of eye movements is notoriously difficult, and to date the computational objectives guiding eye movements remain largely a mystery. In this work, we tackled these problems in the context of a canonical spatial planning task, maze-solving. We collected eye movement data from human subjects and built deep generative models of eye movements using a novel differentiable architecture for gaze fixations and gaze shifts. We found that human eye movements are best predicted by a model that is optimized not to perform the task as efficiently as possible but instead to run an internal simulation of an object traversing the maze. This not only provides a generative model of eye movements in this task but also suggests a computational theory for how humans solve the task, namely that humans use mental simulation.